What is the LPU?

Goal of the LPU

LPU stands for Linguistic Processing Unit. It is a software service for manipulating, managing and processing document corpora. Working with the LPU requires an understanding of the concepts of corpus linguistics; its interface is rather "technical", but using it does not necessarily require computer skills.

Basic concepts

The first notion of the LPU is the "artefact": an artefact is anything that may be manipulated or observed from a linguistic point of view. Examples: a web page, a Word document, a paragraph, a sentence, a word, a letter, a number, a set of words, etc.

To analyse an artefact correctly, one needs to know what "kind" of artefact it is. Indeed, the same printed text (e.g. "never") is interpreted differently depending on whether it is the most frequent word in a document, the first adverb in a document, etc.

This is where the LPU differs from other systems and tools. Usually, the "kind" or "type" of an object provides information about its representation or structure, e.g. a string, a number or a record. In the LPU, the "kind" of an artefact instead indicates how to interpret it. Examples: a "number of words", a "first paragraph", a "most frequent adverb", etc.

In the LPU, the "kind" of an artefact is called its "dimension" (we will see why in a minute); we do not say "the type of an artefact" but "the dimension to which the artefact belongs".

The main use of dimensions is to provide different views of artefacts. A typical example: starting from a dimension of "documents", one might consider the dimension of the "length of a document" and the dimension of the "most frequent adverb of the document". An artefact from the "documents" dimension might be connected to an artefact in the "length" dimension (its length) and to an artefact in the "most frequent..." dimension (its most frequent adverb). Hence each artefact is connected to a number of other artefacts in other dimensions.
In LPU terminology, we say that an artefact may be "projected" from one dimension to another. The terminology comes from an analogy with multi-dimensional spaces: a point in space may be projected onto the X axis, the Y axis, etc. Points are artefacts; axes are dimensions. Refer to the complete list of LPU concepts.

To complete the link with corpus linguistics, a "corpus of documents" is considered in the LPU as a collection of artefacts in a dimension. Read more about How the LPU works.

Working with the LPU

The LPU operates as a server that delivers its services through a SOAP Web Service interface. Hence one needs client software to connect to it. A sample command line client is provided; it may be used from a terminal on Linux, Mac OS X and (probably) Windows, in fact on any computer with perl and the SOAP::Lite perl module installed. However, using the Web Service interface, people might implement other, more user-friendly client software.

Using the basic Command line client

For those interested in technical details, read the LPU Web Service reference.

Adding linguistic functionality to the LPU

The LPU obviously cannot implement all possible linguistic processing. Hence it was designed to be extended. The LPU may be extended with perl programs. There are two options:

#Small and simple perl programs that may be integrated within the LPU itself; they are limited to a subset of perl (pl/perl) and allow neither loading perl libraries, nor access to the file system, nor any network access;
#Full perl programs, which have access to any kind of computer resource, including files and network. Such programs might in turn establish a connection to any kind of linguistic software, such as (among others) POS taggers (treetagger, cordial, etc.), statistical analysis tools, etc.

There is some more detailed documentation about LPU Programming.
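
To make the basic concepts described earlier more concrete, here is a minimal in-memory sketch of artefacts, dimensions and projections. It is written in Python purely for illustration. It is not LPU code and does not talk to the LPU Web Service. The names `Artefact`, `project`, `PROJECTIONS`, `length_in_words` and `most_frequent_adverb` are hypothetical, and the small `ADVERBS` set is an assumed stand-in for a real POS tagger.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical toy lexicon; a real setup would rely on a POS tagger.
ADVERBS = {"never", "always", "often", "quickly"}


@dataclass(frozen=True)
class Artefact:
    """A value together with the dimension that says how to interpret it."""
    dimension: str
    value: object


def length_in_words(text):
    """Count whitespace-separated tokens."""
    return len(text.split())


def most_frequent_adverb(text):
    """Return the most frequent word found in ADVERBS, or None if there is none."""
    words = [w.strip('.,;:!?"').lower() for w in text.split()]
    counts = Counter(w for w in words if w in ADVERBS)
    return counts.most_common(1)[0][0] if counts else None


# Assumed projections from the "documents" dimension to two other dimensions.
PROJECTIONS = {
    ("documents", "length"): length_in_words,
    ("documents", "most frequent adverb"): most_frequent_adverb,
}


def project(artefact, target):
    """Project an artefact onto another dimension, yielding a new artefact."""
    func = PROJECTIONS[(artefact.dimension, target)]
    return Artefact(target, func(artefact.value))


# A corpus is a collection of artefacts in one dimension.
corpus = [
    Artefact("documents", "He never sleeps. She never rests, but often reads."),
    Artefact("documents", "They always arrive quickly and always leave early."),
]

for doc in corpus:
    print(project(doc, "length").value,
          project(doc, "most frequent adverb").value)
```

In this sketch the string "never" carries no interpretation by itself; it becomes a "most frequent adverb" only because `project` returns it inside an artefact of that dimension, which mirrors the point/axis analogy.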